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This research examines the simultaneous influences of height and 
weight on longitudinally measured systolic and diastolic blood pres- 
sure in children. Previous studies have shown that both height and 
weight are positively associated with blood pressure. In children, how- 
ever, the concurrent increases of height and weight have made it all 
but impossible to discern the effect of height from that of weight. To 
better understand these influences, we propose to examine the joint 
effect of height and weight on blood pressure. Bivariate thin plate 
spline surfaces are used to accommodate the potentially nonlinear ef- 
fects as well as the interaction between height and weight. Moreover, 
we consider a joint model for paired blood pressure measures, that is, 
systolic and diastolic blood pressure, to account for the underlying 
correlation between the two measures within the same individual. The 
bivariate spline surfaces are allowed to vary across different groups of 
interest. We have developed related model fitting and inference pro- 
cedures. The proposed method is used to analyze data from a real 
clinical investigation. 

1. Introduction. Excess weight gain has long been recognized as a risk 
factor for metabolic and cardiovascular disorders, including hypertension. 
Population studies have shown that weight strongly predicts blood pressure 
[Huang et al. (1998), Masuo et al. (2000)], although the relationship be- 
tween the two may not be linear [Hall et al. (2010)]. Data from children and 
young adults are equally persuasive on the weight-blood pressure association 
[Stray- Pedersen et al. (2009), Levin et al. (2010)]. In fact, the observations 
are so consistent that some even question whether obesity and hyperten- 
sion are two epidemics or one [Davy and Hall (2004)]. More recently, an 
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increasing number of studies have recognized a similarly salient relation- 
ship between height and blood pressure [Shankar et al. (2005), Fujita et al. 
(2010)]. Although this latter relationship appears to have a seemingly plau- 
sible physiological interpretation (taller individuals need greater pressure to 
maintain oxygenated blood flow to the head and upper extremities), the 
concurrent increases of height and weight have nonetheless made it analyti- 
cally difficult to discern the effect of weight from that of height. The matter 
is further complicated by the observed significantly positive effect of body 
mass index (BMI, defined as Weight/Height^ ) on blood pressure, as shown 
in numerous studies [Lauer and Clarke (1989), Baker, Olsen and Sorensen 
(2007), Falkner (2010)]. In fact, overweight and obesity, clinically defined by 
BMI cutoff points, have been recognized as major risk factors for hyperten- 
sion, and, indeed, hypertension prevalence is much higher in overweight and 
obese children [Falkner et al. (2006), Steinberger et al. (2009)]. Therefore, 
the scientific community has a great interest to elucidate the independent 
influences of height and weight on blood pressure, as they may implicate 
different pathophysiology for this etiologically less understood disease. For 
example, an obesity mediated blood pressure elevation would implicate a 
more activated sympathetic nervous system (perhaps stimulated by adipose- 
derived hormones such as leptin), and increased sodium reabsorption by the 
kidney [Hall et al. (2010)], whereas a strong height influence could give more 
credence to the notion that the disease has its origin in growth [Lever and 
Harrap (1992)]. 

In this research, we assess the simultaneous influences of height and weight 
on blood pressure using prospectively collected data from a cohort of healthy 
children. Such an exercise, however, faces a number of methodological chal- 
lenges: (1) Outcomes are repeatedly measured paired observations. Blood 
pressure consists of two readings, a systolic measurement taken during the 
contraction phase of the cardiac cycle and a diastolic measurement taken 
during the recoil phase of the cardiac cycle. Together, they represent the 
pressure exerted by the circulating blood on the walls of blood vessels during 
two different phases of the same cardiac cycle. For longitudinally collected 
blood pressure measurements, separate modeling of the systolic and diastolic 
outcomes may not be appropriate, as the two are biologically correlated and 
mutually influential [Guo and Carlin (2004)]. (2) Height and weight effects 
on blood pressure may be nonlinear. Previous research has recognized a 
nonlinear pattern of the adiposity effects on blood pressure [Hall (2003)]. 
More recent data indicate a nonlinear height effect on blood pressure as 
well [Tu et al. (2009)]. Preliminary data from this investigation also indi- 
cate nonlinear height and weight effects on blood pressure; see Figure 1 in 
Section 5. Furthermore, interaction of height and weight may exist. To ac- 
commodate, nonlinear bivariate effect surfaces will have to be incorporated 
into the model for the purpose of depicting the joint height-weight effects 
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on blood pressure. (3) Inferences are needed for comparing the joint height- 
weight effects across different gender and ethnicity groups. Such comparisons 
are of great interest, as recent reports indicate significantly higher risks of 
obesity and hypertension in black children than in white children [Anderson 
and Whitaker (2009), Brady et al. (2010)]. Differentially expressed bivariate 
effect surfaces, therefore, may well point to different pathophysiology of the 
disease among people of different ethnic backgrounds. 

To address these challenges, we propose a joint semiparametric mixed 
effects model that includes two individual components, one for systolic and 
the other for diastolic blood pressure measures. Bivariate smooth functions 
for the joint height-weight effects are embedded in these components to 
account for possible nonlinear effects as well as interactions between the two 
independent variables. The two components are then connected by shared 
random subject effects in a unified regression framework. 

Semiparametric regression as a practical data analytical tool has experi- 
enced tremendous growth in the past ten years, especially since the publica- 
tion of the book Semiparametric Regression by Ruppert, Wand and Carroll 
(2003). Methodological extensions, stimulated by exciting applications, and 
new computational approaches, have now covered most commonly encoun- 
tered data situations. He, Fung and Zhu (2005) considered robust generalized 
estimating equations (GEE) for analyzing longitudinal data in generalized 
partial linear models. Lin and Carroll (2006) proposed profile kernel and 
backfitting estimation methods for a class of semiparametric problems. Mod- 
els with bivariate smoothing have been developed for geospatial [Sain et al. 

(2006) , Guillas and Lai (2010)] and medical imaging applications [Brezger, 
Fahrmeir and Hennerfeind (2007)]. Penalized splines have been used to 
analyze longitudinally measured event counts [Dean, Nathoo and Nielsen 

(2007) ]. Crainiceanu, Diggle and Rowlingson (2008) have proposed penal- 
ized bivariate splines for binary response and developed related Bayesian 
inference procedures. More recently, Ghosh and Tu (2009) have developed 
a joint semiparametric structure for zero-infiated counts that consists of a 
logistic model for the proportion of zeros and a log-linear model for Poisson 
counts; both models contain univariate nonparametric components. Ghosh 
and Hanson (2010) studied a semiparametric model for multivariate longi- 
tudinal data in a Bayesian framework. Although there is a rich literature on 
semiparametric analysis of longitudinal (or clustered) data, not much has 
been developed for analyzing bivariate joint effects of two continuous inde- 
pendent variables (height and weight in this application) on a pair of closely 
related outcome variables (e.g., systolic and diastolic blood pressure). The 
current research extends the previous work by introducing group-specific 
bivariate smooth components into the joint modeling of paired outcomes. 
Related model fitting and inference procedures are also developed. 



4 



H. LIU AND W. TU 



The outline of this paper is as follows. We introduce the semiparametric 
mixed model for paired outcomes and its estimation in Section 2. Hypothesis 
testing for group differences in the bivariate effect surfaces is discussed in 
Section 3, followed by a Monte Carlo study in Section 4. As the motivation 
and illustration of the proposed methods, a real data application of child- 
hood blood pressure study is presented in Section 5. We conclude the paper 
with a few methodological and scientific remarks in Section 6. Additional 
details on model-fitting algorithms and model diagnostics are provided in 
the supplementary materials [Liu and Tu (2012)]. 

2. Methods. 



(2.1) 



2.1. A semiparametric mixed model for paired outcomes. We introduce 
our model in a more generic setting. Let Yij = {Y^p ^Y^^^y be a pair of 
outcomes from the zth subject measured at the jth visit, where j = 1, . . . , 
and i = 1, . . . , m. Assuming that we have g = 1, . . . , G groups of interest, 
let Zig be a binary group indicator for the ith subject: Zig — 1 if the ith 
subject belongs to the gth group, Zig = otherwise. We propose the following 
semiparametric mixed effects model for the paired outcomes: 

Y^ = + tjj^p, + f2 f^'\w,„h,,)z.g + e«, 

9=1 
G 

9=1 

where Uj = {U^^\uj;'^^)'^ is the random subject effect vector; tjj denotes 
the time- varying covariate vector of the ith subject at visit j whose effects 
are assumed to be parametric with corresponding parameter vectors il^i 
and ■02) the joint infiuences of two continuous independent variables Wij 
and hij in each group are modeled by the group-specific bivariate smooth 

functions fg^^ and fg"^^ for the paired outcomes respectively; e^-j^ and e^J^ 
are random errors. Herein, we let fi^^^ =tfj'tpi + Yl'^=ifg^\'^ij^f^ij)^ig ^n*^ 
fi^J^ = tfjip2 + S^=i fg^\wij , hij) Zig to denote the mean responses. 

The regression equations of the paired outcomes in (2.1) are connected via 
the shared random effect vector Uj , which is used to account for not only the 
correlation among the repeated measurements from the same subject, but 
also the correlation between the two outcome variables. The subject-specific 
random effect is assumed to be independently normally distributed, that is, 
Uj ~ AA(0, St(), with variance-covariance matrix 



(2.2) 
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where af and a 2 are two variance components, and p is the correlation coeffi- 
cient of the random subject effects of the paired outcomes and Y-^j'^ from 
the same subject measured at one visit. The correlation structure allows us 
to jointly examine the two closely related clinical outcomes in a unified mod- 
eling framework. The within-subject random errors associated with 1^^-^^ and 

Y^P , namely, e^j^ and e['^\ are assumed to follow two independent stochas- 
tic processes that could possibly lead to serial correlation within each error 
series. In general, a wide range of correlation structures could be embedded 
into the errors by assuming, for example, coi{efj ,e[''j,) = q{d{sij,Siji),(j)), 
1 = 1,2, where q is a correlation function taking values between —1 and 1, 
d is some distance measure of the two position vectors Sjj and Sjj/ associ- 
ated with e,-'^ and e,-2, respectively, and cb denotes a vector of correlation 
parameters. Without loss of generality, in the following discussion on model 
estimation we assume simple independent errors with normal distribution 
(ej-j\ e^-^^)^ ~ A/'(0, Se), where = cr^ diag(l, 5^) with the dispersion pa- 
rameter (T^ and a relative scale parameter 6 for modeling heteroscedasticity 
of the random errors associated with the two outcomes. More general error 
processes including autoregressive moving-average (ARMA) and continuous 
time autoregressive structures can be introduced into the current modeling 
framework, which will be revisited at the end of Section 2.3. 

In (2.1) we assume a compound symmetry covariance for the random 
subject effects, which implies that the correlation between the two random 
effects corresponding to the paired outcomes remains constant over time. If 
necessary, serial correlations among repeated measures from the same indi- 
vidual could be specified by the within-subject error structure. In this study, 
we are primary interested in the effect of somatic growth on blood pressure 
development in children, both of which increase with age. Hence, we opt not 
to explicitly introduce temporal effects into the model. In applications where 
time trajectories are of primary interest, time-varying correlation structures 
can be incorporated if constant correlation assumption is not adequate. See 
Morris et al. (2001) and Dubin and Miiller (2005) for related discussion on 
the modeling of varying correlations between random functions. 

2.2. Nonparametric smooth functions. In the proposed semiparametric 
mixed model (2.1), the group-specific bivariate smooth functions are repre- 
sented by linear combinations of some generic basis functions as follows: 

K K 

fg^Hw,h) = ^bk{w,h)l3gk, ff\w,h) = ^Ck{w,h)-fgk, 

k=0 k=0 

where bo = cq = 1, bk{w,h) and Ck{w,h),k = 1, . . . ,K , are the basis functions 
of any bivariate smoother; (3g = (/3go, • • .,/3gKV, 7^ = (7^0, • • .,^gKV,g = 
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1, . . . , G, are regression coefficients associated witli the nonparametric com- 
ponents in each group. Note that (/3io, . . . , /3go) and (710, • • • i 7go) repre- 
sent, respectively, the population average of the paired outcomes in different 
groups. In this research, we choose to use a thin plate spline as the bivariate 
smoother [Wood (2003)], which is computationally more convenient for mod- 
eling high-dimensional nonparametric functions. The thin plate regression 
splines in a semiparametric model can be estimated via the penalized likeli- 
hood approach [Green and Silverman (1994)]. The penalized log-likelihood 
function of model (2.1) can be written as 

G G 

^^ = ^_^A,J(/«)-j;v^,J(/f ), 
9=1 9=1 

where £ is the log-likelihood function, and J is the roughness penalty func- 
tional for a bivariate twice-differentiable function /. Here we write the rough- 
ness penalty J of a generic function f{xi,X2) in the following form: 



As in the unidimensional case, with the observed data, one could express 
the roughness penalty as quadratic forms of the regression coefficient vectors, 
that is, J(/i')) = P^A^gf3g/2 and J{fP) = where and A^ are 

positive semi-definite penalty matrices. The nonnegative smoothing param- 
eters A = (Ai, . . . , Xg)^ and (p = {ipi, . . . , ifc)'^ control the trade-off between 
goodness of fit and model smoothness. The roughness penalties could be fur- 
ther written into a more condensed form (/3^A^/3-|-7"^A^7)/2, where A^ = 

diag(AiA^, . . . , A^A^) and A^ = diag(93iA]^, . . . , (yj^A^) are block-diagonal 
penalty matrices corresponding to the two outcomes. 

With the observed values of the independent variables {wij,hij}i<j<m;i<i<r 
we write the smooth terms (including the intercepts) into a matrix form 



dxl 



■ G 



■ G 
.9=1 



where 



Ziibk{wij,hij) 

0<k<K 



f3 = {(3i,...,(3'a 



X, 



ZiiCk{wij , hij) 

0<k<K 



ZiGbk{wij,hij) 

0<k<K 



ZiGCkiwij,hij) 

0<k<K 



:X^7, 



l<j<ni;l<i<m 



SEMIPARAMETRIC MODEL FOR PAIRED LONGITUDINAL OUTCOMES 7 



and 

7 = (7^,•••,7G)'^• 
Therefore, estimation of the nonparametric bivariate smooth functions can 
be achieved through penahzed estimation procedure of the corresponding 
regression coefficients. The smoothing parameters in semiparametric regres- 
sion models can be determined by, for example, generalized cross-validation 
(GCV) or maximum likelihood (ML) approaches [Wahba (1985)], among 
other methods. In the next section we discuss estimation of the nonpara- 
metric components in the proposed semiparametric mixed model, based on 
the ML method. 

2.3. Mixed model presentation and estimation. Let Yi = {Y^^^,..., 

Yi'lj^, Y2 = (y/j\...,y4'L)^, Y = (Yf,Yj)^ be the response vari- 
able vectors, and N = Yl^i ''^i the number of total observations. We de- 
note Ui = (C/f \ . . . , CZ^y , U2 = (C/f \ . . . , C/i^y , U = (Uf ,Uj)^ as the 
vectors of subject-specific random effects. We write ei = (e^^], . . . , em]nm)'^ i 

€2 = {e^il, ■ • • ,em]n,n)'^ , £ = {^1,^2)'^ the random error vectors. We de- 
note the model matrix associated with the parametric components in (2.1) 
as T = I2 (8) T (I„ is the identity matrix of dimension n, (g) denotes Kro- 
necker product, and henceforth) with corresponding parameter vector -0, 
where T = [t^]i<j<ni;i<i<m and = {i/'J, 11^2)^ ■ It is straightforward to set 
up the model matrix Z = I2 CJi of the random effects U, such that the 
elements of Z^Ui corresponding to subject i are equal to U^^^ , and similarly 
for Z„U2. Then the semiparametric mixed model for paired outcomes could 
be expressed in a more condensed form as follows: 

(2.3) Y = Xi9 + ZU + e, 

where the block-diagonal matrix X = (T, diag(X^, X^)) is the model matrix 
of the fixed effects (including parametric and nonparametric components), 
parameter vector = (•j/j^,/3^,7^)^, U J\f{0, (S> Im) is the random ef- 
fects vector, and random errors e ~ A/'(0, 5]^ Cg) I^v). Model (2.3) can be fit- 
ted using the penalized maximum likelihood method with roughness penal- 
ties on the nonparametric components. Compared with the GCV approach 
for choosing smoothing parameters through penalized estimation procedure, 
ML-based methods are computationally more advantageous [Kohn, Ansley 
and Tharm (1991)]. Furthermore, under the mixed model framework, de- 
termination of the smoothing parameters can be naturally embedded in the 
model estimation procedure [Lin and Zhang (1999)]. In the remainder of 
this section we discuss the fitting algorithm of the proposed semiparametric 
mixed model in greater detail. 
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As many have noted, the penalized hkehhood approach has a natural 
connection to the mixed effects models [Ruppert, Wand and Carroll (2003), 
Wood (2006)]. Within the mixed model framework, the nonparametric smooth 
terms are treated as regular components, with the unpenalized terms as fixed 
effects and penalized terms as random effects. Because of the unpenalized 
terms (e.g., the intercepts) in the smooth components, the penalty matri- 
ces Ajs and are often singular; it is therefore necessary to separate the 
unpenalized (fixed) and penalized (random) elements in the parameter vec- 
tors P and 7 so that the penalty matrices associated with the penalized 
elements are of full-rank. Specifically, we write the parameter vectors as 
P = (/3|^,/9^)'^ and 7 = (T'piT/j)"^, with corresponding full-rank penalty ma- 
trices S/3 and on and 7^ respectively. In this formulation, we consider 
(3p, -yp as fixed effects, (3^, 7^ as random effects so that 0^ A.pf3 = 0^Spl3fi 
and 7^A^7 = 7]JS^7^ (note that the fixed effects I3p and 7^ have zero 
roughness penalty). By rewriting the model matrices of the smooth compo- 
nents as = (X^,X^) and X^ = (Xj,,X]j) and letting = (■j/>"^,/3p,7p)'^ 
be the parameters of the fixed effects, and r/ = (U^, , 7/j)"^ be the random 
effects, we are able to express the semiparametric model (2.3) as a linear 
mixed model (LMM): 

(2.4) Y = X0-hZ77 + e, 

where the block-diagonal model matrices are defined as X= (T, diag(X^, X^,)) , 
Z = (Z, diag(X^, X^)); the random effects rj ~ AA(0, S,,), and random errors 
e~A/'(0,R), with = diag(S„ (g) !„, S;^^), and R = Se(E)lAr. From 
a Bayesian perspective, under uniform and improper priors on the fixed 
effects and Gaussian priors on the random effects with variance-covariance 
matrix 5]^, the penalized likelihood estimates are simply the posterior modes. 
The variance-covariance matrices S^^ and of the random effects (3^ and 
7^ depend on the smoothing parameters A and respectively, which can 
be treated as regular variance components in the LMM. 

In this LMM framework, the semiparametric mixed model can be fitted 
using either ML or restricted maximum likelihood (REML) methods [see, 
e.g., Lin and Zhang (1999)], with the smoothing parameters treated as reg- 
ular variance component parameters. Specifically, we write e = Z77 -|- e, and 
the variance component parameter vector t = (A"^, (^^, p, 5, (t|, it|, o"^)-'". It 
then follows that (2.4) is equivalent to 

(2.5) Y = X0 + e, e~AA(0,V), 

where V = ZS^Z^ + R is a function of the variance components r. Hence, 
the likelihood function given the observed response vector y becomes 
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Model estimation of (2.5) can be achieved by maximizing the above objective 
function or the REML criterion £_r(t) = log/ L{6,t) dO. The latter has a 
closed-form expression 

^r(t) = -Hlog |V| + log |X^V-iX| + (y - X.e fV~\y - X0)}, 

where 6 = (X-^V~^X)^^X-^V^^y is the generalized least-square estimate 
of the fixed effects 6 given V. Statistical inferences concerning the model 
parameters in (2.5) can thus be conducted in this LMM framework. 

We conclude this section with a brief comment on the correlation struc- 
ture of the random errors. In the above discussion we have assumed an in- 
dependent error structure with variance-covariance matrix R = ® I^r for 
convenience of derivation. In some longitudinal applications, however, such a 
simple error structure may not be adequate. To accommodate more complex 
error processes, we can let the variance matrix take a more general form, 
for example, R = R(5, o"^, </>), where is the correlation parameter vector. 
Serial correlation structures such as the often used autoregressive-moving 
average (ARMA) can be embedded into the current model framework with 
properly defined variance matrix R. If the longitudinal measurements are 
not equally spaced due to design or missingness, a continuous time error 
process may be adopted. For example, the continuous time autoregressive 
(of order 1) structure is widely used in many applications [Jones (1993)] 

and it assumes cor{efj ,efj,) = 4>'^,l = 1,2, with d denoting the time interval 
between the two measurements and (p being the correlation parameter of 
unit time interval. See Pinheiro and Bates [(2000), Section 5.3], for detailed 
discussion on model specification of various error structures. 

3. Hypothesis testing. 

3.1. Bootstrap test. An implicit assumption of the proposed model (2.1) 
is that the nonparametric bivariate surface may interact with other indepen- 
dent variables. In other words, the joint effects of the two continuous vari- 
ables may vary across different groups. In the context of the blood pressure 
study, an important scientific question is whether the joint height-weight ef- 
fects on blood pressure differ among sex-ethnicity groups. In particular, we 
are primarily interested in testing the following hypothesis in model (2.1): 

(3.1) Ho:fi^^ = --- = f^^\ /f) = . . . = vs. /^a: otherwise. 

A likelihood ratio test could be constructed based on statistic A = 1(0, t) — 
I^OqjTq), where i{6,T) represents the value of the log-likelihood function 
evaluated at the maximum likelihood (or REML) estimates from the unre- 
stricted model and ^(^Oj^o) represents the value of log-likelihood evaluated 
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under the null hypothesis. Zhang and Lin (2003) proposed to use a scaled 
distribution to test the equivalence of two nonparametric functions in 
semiparametric additive mixed models. The test they proposed considered 
unidimensional smooth functions for two groups. It is much more difficult in 
comparing bivariate smooth functions from multiple (G > 2) groups, espe- 
cially if the supports of the bivariate functions are not entirely overlapping, 
such as, in our application, boys and girls have different ranges of height 
and weight. In the absence of theoretical development on the sampling dis- 
tribution of the likelihood-based test statistic A for paired outcomes, we 
resort to resampling techniques for the approximation of the empirical dis- 
tribution of A. Similar techniques were proposed by Roca-Pardihas et al. 
(2008) for testing of factor-by-surface interactions in a logistic generalized 
additive model (GAM). We herein extend this test to paired outcome data 
in a longitudinal setting. 

The bootstrap testing procedure that we propose is carried out through 
the following steps: 

(1) For j = 1, . . . and i = 1, . . . ,m, estimate (predict) the restricted 
mean response fl^j, random subject effect Uj and random error eij, from 
the fitted model (2.1) under the null hypothesis, where fi^j = {Jj-ij\'[j-if)'^ , 

(2) Draw a bootstrap sample of the random subject effects \Ji = {U-^\u^'^^)'^ 
from {Ui}i<i<m with replacement; 

(3) Let the bootstrap residuals be e.j^ = f-jj^^jj^ and = ^i^^^i^K where 

e[\j^ and e[J^ are i.i.d. random variables which have equal probabilities 0.5 
to be 1 or —1; 

(4) Generate a bootstrap sample of paired responses Yjj = {Y^p ,Y^p)'^ 

by = U\^^ + V^t- + and = + ^g) + g{2)^ based on the boot- 
strap samples from Steps 2 and 3; 

(5) Fit the joint model (2.1) to the bootstrap data {Yjj}i<j<„.:i<j<m 
under the null hypothesis and the unrestricted model, and calculate the 
bootstrap test statistic A*; 

(6) Repeat Steps 2-5 for 6 = 1, . . . , i? times, to obtain a bootstrap sample 
of the test statistic {^l}i<b<B-, which can be used as the nominal distribu- 
tion of the test statistic under the null hypothesis. 

The p- value of the bootstrap testing is calculated as p = #{A > A^}/i?. 
It should be noted that in Step 3, a wild bootstrap [see, e.g., Liu (1988), 
Mammen (1993)] with the Rademacher distribution is used instead of the 
original version, as the former has been shown to possess better finite-sample 
performance [Davidson and Flachaire (2008)]. This bootstrap procedure is 
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partly based on the best linear unbiased predictors (BLUPs) of the random 
effects, which may underestimate the variability in the data and lead to bi- 
ased inferences [although the results are asymptotically unbiased, see Morris 
(2002)]. In our application, with a relatively large sample size (m = 418 sub- 
jects with a median of 16 visits for each subject), the bias associated with 
the test is likely to be negligible. 

3.2. An ad hoc likelihood ratio test. Due to the large number of iterative 
fitting of complex models, the implementation of the previously proposed 
resampling-based test is computationally intensive. In this section we con- 
sider an ad hoc likelihood ratio test (LRT) based on the asymptotic 
distribution, as a computationally more efficient alternative. Writing the 
semiparametric mixed model (2.1) as a linear mixed model (LMM) as in 
(2.4), we could construct a LRT within the LMM framework for hypothe- 
sis (3.1). However, as noted by Crainiceanu and Ruppert (2004), the asymp- 
totic properties of LRT based on distributions [Self and Liang (1987)] 
are not always satisfactory when applied to penalized splines. Whereas, if 
no roughness penalty is added to the smooth functions, statistical infer- 
ences (including significance tests) will have more reasonable behaviors for 
unpenalized models [Wood (2006), page 195], at the price of overfitting. Ad- 
ditionally, LRT for fixed effects based on standard Xu distributions (with 
the degrees of freedom being the difference of the numbers of parame- 
ters between the null and unrestricted) tends to be more "anticonservative" 
[Pinheiro and Bates (2000), see discussions on pages 87-88]. To alleviate, 
we adopt a mixture of xt ^'^d xt+i ^ reference distribution suggested 
by Stram and Lee (1994), the empirical performance of which is studied in 
Section 4. The adjusted LRT is conducted through the following steps: 

(1) Fit model (2.1) using penalized splines based on ML to obtain the 
effective degrees of freedom (EDF) for the penalized spline estimates; 

(2) Refit (2.1) under the null and unrestricted models, respectively, by 
fixing the degrees of freedom to be (approximately) the estimated EDF 
from Step 1 using the unpenalized splines; 

(3) Calculate the p-value based on 2 times the log-likelihood ratio from 
Step 2, using ^xi + ^X^+i as the nominal distribution. 

This alternative LRT procedure significantly reduces the computational 
burden of the aforementioned inference. The resampling-based testing pro- 
cedure, on the other hand, is methodologically better grounded and is more 
likely to have superior finite-sample performance. Nonetheless, despite the 
ad hoc nature of the LRT, it might be able to provide quick testing results 
with reasonable accuracy. The justification of the LRT is entirely empirical. 
To that end, we conduct a Monte Carlo study to assess the operational char- 
acteristics of the LRT (see Section 4). In practice, we recommend the use 
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of the resampling-based testing procedure whenever computing resources 
are available. In the absence of adequate computing power, the LRT may 
provide a reasonable relief, but the test results should be interpreted with 
caution, especially for borderline cases. 

4. Monte Carlo study. To assess the performance of the likelihood ratio 
test on the significance of factor-by-surface interaction in the semiparamet- 
ric mixed model, we conduct a Monte Carlo study. Simulation results are 
presented in this section. 

The simulated data are generated from two nonlinear bivariate test func- 
tions /i and /2, defined on [0,1] x [0,1]: = Sx^ + log(0.5t + 1) + 
t + 3^°■^^+^ f2{x,t) = 1.5^+1.5*3 + 2.25xe*. The two correlated outcome 

variables iY}l\Yl^^) are generated for i = 1, . . . ,m and j = 1, . . . ,n from 

f Yi? = Ul^^ + /3o + ZiPi + fi{w,„hij) + 
I y!^^ = Ui'^ + 70 + ^^7l + /2 {wij , h,j ) + eg) , 

where ([/^^\ C/^^^^)^ ~ 7V(0, as in (2.2) with ai = 2,(T2 = 3, and p = 0.5; 

(eSj\eg))^ ~7^(0,cr2diag(l,52)), with a, = 2 and 5 = 0.8; the first m/2 
subjects are labeled as group 1 and the remaining belonged to group 2, Zi 
is the group indicator variable; the true bivariate covariate effects of {w,h) 
are assumed to be homogeneous across groups with functional forms of fi 
and /2 (/ denotes corresponding smooth functions centered at the observed 
covariate values) for the two outcomes respectively; but the two groups have 
different intercepts with /3o = 10, /3i = 2,70 = 15, and 71 = 4. 

We then conducted the proposed likelihood ratio tests based on the un- 
penalized spline estimates and mixture distribution. We examined two 
levels of m = 50, 100 and two levels of n = 20, 40. The size of the test in each 
scenario was based on 500 replications and summarized in Table 1, which 
was observed to be very close to its nominal level 0.05 in each case. 

5. Analysis of blood pressure data. 

5.1. A childhood blood pressure development study. Children from lo- 
cal schools were recruited for participation in a prospective cohort study. 
Those with known cardiovascular disease, hypertension, kidney disease, and 
those on blood pressure altering medications were excluded. Blood pressure, 
height, weight and heart rate are measured semi-annually. The study is cur- 
rently ongoing. In this analysis, we use a subset of m = 418 children that 
have at least ten (>10) semi-annual assessments. The data set includes 154 
white boys (sex-ethnicity group 1, or group 1 for short and henceforth), 136 
white girls (group 2), 70 black boys (group 3) and 58 black girls (group 4). 
Figure 1 shows the marginal effects of height and weight on systolic and 
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Table 1 

Simulation results for likelihood ratio tests, with nominal level 
0.05. The results were based on 500 replications 



m 


n 


Distribution 


Size 


50 


20 


xl 


0.060 






xl+i 


0.048 






ly2 , 12 


0.052 


50 


40 


xl 


0.052 






xl+i 


0.036 






ly2 , 12 


0.042 


100 


20 




0.062 






xl+1 


0.044 






ly2 , 12 


0.054 


100 


40 


xl 


0.056 






xl+1 


0.042 






2 Ai^ ^ 2 A/^+1 


0.046 



diastolic blood pressure using scatterplot smoothing [Cleveland (1979)], for 
each of the sex-ethnicity combinations. Examining the estimated marginal 
effects, we see clear indications of nonlinear weight effect and different pat- 
terns of height and weight effects in boys and girls of different ethnicity. 

To accommodate these data features, we perform an analysis using the 
proposed semiparametric mixed model that simultaneously assesses the height 
and weight effects on systolic and diastolic blood pressure. We present bivari- 
ate effect surfaces in colored contour plots for the examination of the joint 
height-weight effects. We also perform resampling-based and LRT-based in- 
ferences for the detection of possible gender and ethnicity differences in these 
bivariate surfaces. 

5.2. Model specification. The following model is used to examine the 
joint height-weight effects on diastolic and systolic blood pressure in chil- 
dren: 



(5.1) 



DBP.j = Ut+ pijiPd + fg{wij,hij)zig + el 



9=1 

4 



SBPij = U- +Pij'4)s + ^ fg{wij,hij)zig + elj, 

9=1 



where DBPjj, SBPjj, pij, Wij and hij, respectively, represent the diastolic 
and systolic blood pressure, heart rate (or pulse, which is used as a surro- 
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Fig. 1. Marginal effects of weight and height on blood pressure using a LOWESS 
smoother. 



gate marker for cardiac output), weight and height measured from the ith 
subject at the jth visit; Uf and C// are the random subject effects; V'd and 
tps are the regression coefficient parameters of heart rate on diastohc and 
systohc blood pressure respectively; fg and fg are the unknown bivariate 
smooth functions to depict the joint weight and height effects on diastolic 
and systolic blood pressure, respectively, in the four sex~ethnicity groups 
{g = 1, ... ,4); and Zig is the corresponding group indicator. Note that the 
intercept terms (/3iO; • • • , /?4o) and (71O) ■ ■ ■ > 74o) representing, respectively, 
the population average diastolic and systolic blood pressure in different sex- 
ethnicity groups are absorbed into the corresponding group-specific smooth 
components. The effects of heart rate were found to be linear in preliminary 
analyses. Hence, they are included in the model as linear components for 
ease of clinical interpretation. The random subject effects are assumed to 
have the same distribution as specified in model (2.1). 

Since the outcomes are measured repeatedly for each subject during the 
follow-up, possible serial correlations may exist. According to the study pro- 
tocol, enrolled subjects were asked to return every six months for measure- 
ments after the baseline screening. However, the longitudinal data collec- 
tion was not exactly evenly spaced due to delayed or even missed clinic 
visits. To accommodate, we incorporate a continuous time autoregressive 
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structure into the within-subject errors. To be more exact, we assume that 
cor(ef^.,eJ,)=cor(e|^.,e|.,) = 0l"^'>^-''^'=»^'l, for 1 < j ^ f < rn, i = l,...,m, 
where age^j denotes the age (in years) of subject i at the jth visit and (j) is 
the autocorrelation parameter of unit time interval. 

The core model fitting procedure is based on the gamm (generalized addi- 
tive mixed model) routine in R package mgcv [Wood (2010)]. We have made 
necessary extensions to accommodate the complex model features (e.g., cor- 
relation structure of paired longitudinal outcomes) and visualization of the 
results. The confidence intervals of the model parameters are derived from 
the observed information matrix in the LMM framework. The estimated bi- 
variate smooth functions of weight and height in each sex-ethnicity group 
are presented and compared using colored image plots and contour lines. 
A detailed description of the model-fitting algorithms can be found in Sec- 
tion A of the supplementary materials [Liu and Tu (2012)], together with 
model diagnostics for model assumption verification and goodness-of-fit as- 
sessment in Section B. 

5.3. Analytical results. From the REML estimates (which are very close 
to the ML estimates) of the semiparametric mixed model (5.1) based on 
6867 pairs of blood pressure assessments from 418 subjects, we note a sub- 
stantial correlation (p = 0.52 with 95% confidence interval (CI) [0.42,0.61]) 
between the diastolic and systolic blood pressure within the same subject. 
Systolic blood pressure has slightly greater variability {a2 = 5.29; 95% CI: 
[4.88,5.73]) than diastolic measurements {ai =4.57; 95% CI: [4.18,4.99]), 
but the difference is not statistically significant. However, the random er- 
ror associated with systolic blood pressure (within the same subject) has 
a significantly smaller variance, as reflected by the magnitude and corre- 
sponding confidence interval of the scaling parameter (6 = 0.87; 95% CI: 
[0.85,0.90]). Such an observation is consistent with the previously published 
data on pediatric blood pressure measurements [Falkner et al. (2006)], and 
it may in part reflect the difficulty in clearly pinpointing the start of the fifth 
Korotkoff sound in diastolic measurement [Pickering et al. (2005)]. The esti- 
mated variance of the random error is = 7.39; 95% CI: [7.26, 7.53]. We also 
detected slight autocorrelation in within-subject errors, with (p = 0.014; 95% 
CI: [0.010,0.020]. The estimates of the average diastolic and systolic blood 
pressure in different gender and ethnicity groups in model (5.1) are listed 
in Table 2. Heart rate has a negative effect on the diastolic blood pressure 
with ■0^ = —0.04 [standard deviation (SD) = 0.01], whereas it is positively 
associated with systolic blood pressure ips = 0.07 (SD = 0.01). The finding is 
not surprising because heart rate directly refiects cardiac output, which typ- 
ically increases with pulse pressure (systolic minus diastolic blood pressure). 
With pulse pressure relating systolic positively and diastolic negatively, one 
would expect systolic blood pressure to increase with heart rate and diastolic 
blood pressure to decrease with heart rate. 
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Table 2 

Estimates of the parameters of the semiparametric joint blood pressure model with 95% 
lower (confidence) hound (LB) and upper (confidence) bound (UB) 



Parameter 


Sex-ethnicity group 


Estimate 


Std. Dev. 


95% LB 


95% UB 


/3io 


White male 


64.99 


0.98 


63.03 


66.95 


P20 


White female 


66.02 


0.99 


64.04 


68.00 


Pm 


Black male 


65.98 


1.09 


63.80 


68.16 


Pao 


Black female 


65.76 


1.21 


63.34 


68.18 


710 


White male 


100.41 


0.92 


98.57 


102.25 


720 


White female 


97.51 


0.93 


95.65 


99.37 


730 


Black male 


99.93 


1.05 


97.83 


102.03 


740 


Black female 


98.27 


1.17 


95.93 


100.61 



The estimated bivariate smooth functions of weight and height in the four 
sex-ethnicity groups are plotted in Figures 2 and 3. 

Both the adjusted hkehhood ratio (LR) test [2 log Li? = 217.6 with ref- 
erence distribution {xsi + xis)/^' P < 0.001] and the bootstrap test (Fig- 
ure 4 with B = 1000 replications) suggest significantly different joint height- 
weight influences on blood pressure across the four sex and ethnicity groups. 
Aside from the significant test results of the bivariate height-weight effect 
surfaces, the most interesting observation from this analysis is the appar- 
ently different shape of these bivariate functions: (1) While blood pressure 
generally increases with weight as well as height, weight clearly has a much 
greater overall influence on blood pressure. In fact, at a given weight level, 
the height effects are often minimal, as indicated by the (nearly) vertical 
contour lines. (2) Among the heavier boys (those with weight greater than 
120 kg, e.g.), blacks appear to have higher systolic blood pressure than 
whites. The reverse is true for girls. From the fitted effect surfaces, we see 
that heavier white girls appear to have higher systolic blood pressure than 
their black counterparts. (3) For diastolic blood pressure, while weight is 
still the dominant infiuence, height does have an effect. The more intriguing 
observation is perhaps the clear difference between black and white boys. For 
example, when weight is about 80 kg, taller black boys have lower diastolic 
blood pressure. While one would be attempted to attribute this to the lower 
corresponding BMI values, the opposite is true for white boys. These more 
complex pictures of height and weight influences on blood pressure point to 
possible existence of distinct physiology of blood pressure development in 
black and white children. 

6. Discussion. In summary, we have presented a joint model for paired 
outcomes with bivariate effect surfaces of two continuous independent vari- 
ables. This model extends previous work by accommodating longitudinally 
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Fig. 2. Bivariate smooth function estimates of the joint height-weight effects on diastolic 
blood pressure [defined m equation (5.1)] in boys and girls of different ethnicity. 

measured outcome pairs, as well as bivariate covariate effect surfaces. With 
the introduction of factor- by-surface interactions, it also allows for the incor- 
poration of group-specific surfaces (i.e., group-height-weight interactions). 
For implementation, we have developed necessary computational procedures. 
Resampling-based and LRT-based inferences concerning the group-specific 
bivariate effects are discussed. Simulation study indicates adequate perfor- 
mances in the proposed adjusted LRT procedure. 

Using the proposed method, we examined the influences of height and 
weight on blood pressure in children undergoing the pubertal growth pro- 
cess. For adults, there is a generally accepted notion that body weight has 
a predominant influence on blood pressure. For children that undergo the 
pubertal growth process, height and weight are known to increase concur- 
rently with age, and both height and weight are positively related to blood 
pressure. Few studies have directly examined the relative contributions of 
height and weight, partly due to the lack of appropriate analytical tools 
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Fig. 3. Bivariate smooth function estimates of the joint height-weight effects on systolic 
blood pressure [defined m equation (5.1)] in boys and girls of different ethnicity. 

to discern these simultaneous effects. With the newly developed statistical 
method, we examined the influences of height and weight on longitudinally 
measured blood pressure. We found that in children, just like in adults, 
weight tended to have a noticeably stronger impact on blood pressure, even 
during a period of vigorous linear growth. The study finding provides di- 
rect evidence that adiposity, as reflected by weight, is the primary driver 
of the blood pressure development in children. The finding is consistent 
with the latest pediatric literature on the connection between adiposity and 
blood pressure. Clinically, our finding highlights the importance of weight 
management in overweight and obese children: excessive weight gain could 
significantly increase the hypertension risk in children [Tu et al. (2011), 
Falkner and Gidding (2011)]. Mechanistically, weight-mediated blood pres- 
sure elevation in young and healthy children points to the need for studies 
focusing on including adipose-derived hormones. Animal and human stud- 
ies suggested one of such hormones, leptin, could act upon the sympathetic 
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Log-Likelihood Ratio 



Fig. 4. Empirical distribution of log-likelihood ratio based on bootstrap technique with 
B = 1000 replications. 



nervous system (SNS), which contributes to the elevation of blood pressure. 
More recently, investigators have proposed mouse models describing new sig- 
naling pathways in the pathogenesis of obesity hypertension through leptin 
[Purkayastha, Zhang and Cai (2011), Humphreys (2011)]. Interestingly, in 
a separate investigation, our research team has also discovered dramatically 
increased leptin levels and heart rate (as an indicator for a more activated 
SNS) in overweight and obese children, corresponding to the elevated blood 
pressure [Tu et al. (2011)]. The findings from the current study certainly 
gives more credence to this hypothesized pathway between adiposity and 
blood pressure. We are currently investigating the possibility of adiposity 
acting upon blood pressure through alternative pathways, such as the renin- 
angiotensin-aldosterone system. 

A limitation of this research is that we only included children who had 
ten or more longitudinal assessments in this methodological development, 
following a condition stipulated by our current data use agreement. While 
such a restriction could potentially limit the generalizability of the study 
finding, we note that the number of assessments is not related to the study 
subjects' behaviors but to the timing of their school's participation into the 
original study. As a result, children who had fewer assessments (and thus 
were excluded from the current analysis) are unlikely to be systematically 
different from those who contributed more observations. Notwithstanding 
this limitation, our research does provide an important analytical tool that 
may significantly enhance mechanistic and epidemiologic investigations con- 
cerning blood pressure development in children. 
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SUPPLEMENTARY MATERIAL 

Detailed model-fitting algorithm and model diagnostics 

(DOI: 10.1214/12-AOAS567SUPP; .pdf). We provide the computational de- 
tails of the model-fitting algorithm with sample R code and an R function 
to visualize the predicted bivariate surfaces. Some model diagnostics plots 
are also provided. 
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